Preserving Micro Data Release: Categorical and Numerical Data

Authors

  • E. POOVAMMAL
  • M. PONNAVAIKKO
Abstract

Data mining techniques, despite their benefits in a wide range of applications, have also raised threats to privacy and data security. The attributes in a database table can be classified into three categories: identifying attributes, sensitive attributes, and quasi-identifier attributes. k-anonymity is a popular approach to privacy-preserving data mining, and its shortcomings have been addressed by techniques such as l-diversity and t-closeness. Although all of these techniques increase privacy, they incur considerable information loss, and their computational complexity is high. Here the privacy problem is addressed with a fuzzy-based approach for numerical attributes and a taxonomy-tree-based mapping table approach for categorical attributes. The Privacy Level and Disclosure Level values supplied by each individual allow personalized privacy preservation, and the fuzzy treatment of quasi-identifier attributes avoids linking attacks. Compared with perturbation methods and other k-anonymity methods, the proposed fuzzy and mapping-table-based approach to privacy-preserving data publication requires less computational effort while preserving information.
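The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a numerical quasi-identifier is replaced by a fuzzy label, and a categorical one is generalized through a taxonomy tree. The fuzzy set boundaries, the taxonomy, the field names, and the reading of the privacy level as "number of generalization steps" are all assumptions made for illustration, not the authors' definitions.

```python
# Hypothetical trapezoidal fuzzy sets for an "age" attribute: (label, a, b, c, d).
# Membership rises from a to b, is 1 between b and c, and falls from c to d.
AGE_SETS = [
    ("young", 0, 0, 25, 35),
    ("middle-aged", 25, 35, 50, 60),
    ("old", 50, 60, 120, 120),
]

# Hypothetical taxonomy tree for an "occupation" attribute, stored as child -> parent.
TAXONOMY = {
    "nurse": "healthcare",
    "surgeon": "healthcare",
    "teacher": "education",
    "professor": "education",
    "healthcare": "professional",
    "education": "professional",
    "professional": "any",
}


def trapezoid(x, a, b, c, d):
    """Degree of membership of x in the trapezoidal fuzzy set (a, b, c, d)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def fuzzify_age(age):
    """Replace a numeric age with the label of its best-matching fuzzy set."""
    return max(AGE_SETS, key=lambda s: trapezoid(age, *s[1:]))[0]


def generalize(value, levels):
    """Climb `levels` steps up the taxonomy tree, stopping at the root."""
    for _ in range(levels):
        value = TAXONOMY.get(value, value)
    return value


def anonymize(record):
    """Publishable version of a record; privacy_level is an assumed per-person setting."""
    return {
        "age": fuzzify_age(record["age"]),
        "occupation": generalize(record["occupation"], record["privacy_level"]),
        "disease": record["disease"],  # sensitive attribute, left unchanged in this sketch
    }


record = {"age": 22, "occupation": "nurse", "privacy_level": 2, "disease": "flu"}
published = anonymize(record)
```

In this sketch a higher privacy level simply maps the categorical value further up the tree, which is one possible way to make the generalization personal to each individual.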


Similar resources

VICUS - A Noise Addition Technique for Categorical Data

Privacy preserving data mining and statistical disclosure control have received a great deal of attention during the last few decades. Existing techniques are generally classified as restriction and data modification. Within data modification techniques noise addition has been one of the most widely studied but has traditionally been applied to numerical values, where the measure of similarity ...


Presenting a Clustering Algorithm for Categorical Data by Combining Measures

Clustering is one of the main techniques in data mining. Clustering is a process that partitions a data set into groups. In clustering, the data within a cluster are as close to each other as possible, and the data in two different clusters differ as much as possible. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...


Data Clustering and Micro-perturbation for Privacy-Preserving Data Sharing and Analysis

Clustering-based data masking approaches are widely used for privacy-preserving data sharing and data mining. Existing approaches, however, cannot cope with the situation where confidential attributes are categorical. For numeric data, these approaches are also unable to preserve important statistical properties such as variance and covariance of the data. We propose a new approach that handles...


Marginality: a numerical mapping for enhanced treatment of nominal and hierarchical attributes

The purpose of statistical disclosure control (SDC) of microdata, a.k.a. data anonymization or privacy-preserving data mining, is to publish data sets containing the answers of individual respondents in such a way that the respondents corresponding to the released records cannot be re-identified and the released data are analytically useful. SDC methods are either based on masking the original ...


A Survey of Statistical Approaches to Preserving Confidentiality of Contingency Table Entries

In the statistical literature, there has been considerable development of methods of data releases for multivariate categorical data sets, where the releases come in the form of marginal and conditional tables corresponding to subsets of the categorical variables. In this chapter we provide an overview of this methodology and we relate it to the literature on the release of association rules wh...




Publication date: 2009